Figures for journals tend to have a couple of common features:
Many of us do some of these things within the script that makes the individual panels and then finish assembling the final product in another tool like PowerPoint, Illustrator, or the open source Inkscape. My goal here is to demonstrate some tricks for academic figure making that elevate the final product and make iterating on figure design easier.
Where do you present your data? How do your plots differ between the first time they show up in lab meeting and your final manuscript? How many versions happened in between? Why did you make changes?
This tutorial/post is geared toward data graphics appearing in academic journals, which can look very different from the versions you’d show in a lab meeting or conference presentation.
I generally recommend separating the code that generates and prepares your data from the code that makes your plots, and importing the prepared data at the top of your plotting script. This saves you the headache of hunting for your plotting code, and it encourages keeping separate scripts for poster or presentation graphics and for manuscript figures. They’re just too different. :)
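Here is a minimal sketch of that split, assuming the prepared data is stored as an .rds file. The script names, the object names, and the Sepal.Area column are all hypothetical, and a temporary directory stands in for a project folder.

```r
# --- prepare_data.R (hypothetical script name) ---
# Clean the raw data once and save the result to disk.
prepared_iris <- subset(iris, !is.na(Sepal.Length))
prepared_iris$Sepal.Area <- prepared_iris$Sepal.Length * prepared_iris$Sepal.Width

data_path <- file.path(tempdir(), "prepared_iris.rds") # stand-in for e.g. a data/ folder
saveRDS(prepared_iris, data_path)

# --- plot_manuscript.R (hypothetical script name) ---
# The plotting script starts by importing the prepared data.
plot_data <- readRDS(data_path)
str(plot_data)
```

A poster script and a manuscript script can then both start from the same readRDS() call without repeating any of the data cleaning.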
R has what we call “base” graphics, which are totally fine. They get the job done, and can make exploratory data analysis super quick (e.g. hist(data$x)).
ggplot2 provides an enormous library of plot types and customization options, all based on the grammar of graphics. Imagine an old school classroom projector with transparency sheets. You can consider ggplot’s “geoms” like individual transparency sheets that you can stack to make a final plot. Stack a fit line on top of a scatter plot with geom_point() + geom_abline().
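As a rough sketch of the transparency-sheet idea, the code below stacks a fitted line on top of a scatter plot. It uses the built-in mtcars dataset purely as a stand-in, and the simple linear model is my own choice for the illustration.

```r
library(ggplot2)

# Fit a simple linear model so we have an intercept and slope to draw
fit <- lm(mpg ~ wt, data = mtcars)

layered_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() + # first "transparency": the raw observations
  geom_abline(
    intercept = coef(fit)[["(Intercept)"]],
    slope = coef(fit)[["wt"]]
  ) # second "transparency": the fitted line

layered_plot
```

Each geom added with + becomes one more layer, drawn in the order it was added.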
We’ll start with the classic iris dataset.
library(tidyverse) # includes the ggplot2 package
library(skimr)
skim(iris) %>% skimr::kable()
Skim summary statistics
n obs: 150
n variables: 5
Variable type: factor
| variable | missing | complete | n | n_unique | top_counts | ordered |
|---|---|---|---|---|---|---|
| Species | 0 | 150 | 150 | 3 | set: 50, ver: 50, vir: 50, NA: 0 | FALSE |
Variable type: numeric
| variable | missing | complete | n | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Petal.Length | 0 | 150 | 150 | 3.76 | 1.77 | 1 | 1.6 | 4.35 | 5.1 | 6.9 | ▇▁▁▂▅▅▃▁ |
| Petal.Width | 0 | 150 | 150 | 1.2 | 0.76 | 0.1 | 0.3 | 1.3 | 1.8 | 2.5 | ▇▁▁▅▃▃▂▂ |
| Sepal.Length | 0 | 150 | 150 | 5.84 | 0.83 | 4.3 | 5.1 | 5.8 | 6.4 | 7.9 | ▂▇▅▇▆▅▂▂ |
| Sepal.Width | 0 | 150 | 150 | 3.06 | 0.44 | 2 | 2.8 | 3 | 3.3 | 4.4 | ▁▂▅▇▃▂▁▁ |
This is a basic scatterplot using the ggplot2 defaults.
scatter_plot <- iris %>%
ggplot(aes(x = Petal.Width, y = Petal.Length, color = Species)) +
geom_point()
scatter_plot
This looks alright, but we can do better! What would you change before including this in your manuscript?
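If you load the cowplot package on its own (or after loading ggplot2 or the tidyverse), you’ll get a version of ggplot with more “academically oriented” default settings. In my view, this is a lot faster than going the DIY route (although you totally can!). Note that this describes cowplot before version 1.0; from 1.0 onward, loading the package no longer changes the default theme, and you need to call theme_set(theme_cowplot()) to get the same look. Let’s try that first plot again.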
library(cowplot)
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggplot2':
##
## ggsave
better_scatter <- iris %>%
ggplot(aes(x = Petal.Width, y = Petal.Length, color = Species)) +
geom_point()
better_scatter
Does this look better? cowplot has several other functions that allow for customization of your figure, like adding a background grid. Standard ggplot syntax still works for making other changes.
better_scatter <- better_scatter +
labs(y = "Petal length (cm)", x = "Petal width (cm)") +
scale_color_brewer(
type = "qual",
palette = "Dark2", # from ColorBrewer
labels = c("I. setosa", "I. versicolor", "I. virginica")
) +
theme(legend.text = element_text(face = "italic"))
better_scatter
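The background grid mentioned above can be added with a single cowplot call. This is a small sketch that rebuilds the scatterplot from scratch; applying theme_cowplot() explicitly is my own choice here, so that the result does not depend on which theme happens to be the default.

```r
library(ggplot2)
library(cowplot)

grid_scatter <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length, color = Species)) +
  geom_point() +
  theme_cowplot() + # apply the cowplot theme explicitly
  background_grid(major = "xy") # light grid lines along both axes

grid_scatter
```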
Let’s try making a few other plots, so we can play around with combining them into a multi-panel figure.
box_plot <- iris %>%
ggplot(aes(x = Species, y = Sepal.Length)) +
geom_boxplot(outlier.shape = NA) + # otherwise outliers will show as black dots
theme(axis.text.x = element_text(angle = 45, vjust = 0.5)) + # Useful for narrow plots
labs(y = "Sepal Length (cm)")
box_plot
Ok, that looks fine, better than a barplot. But! For the same amount of space - we can show more data by overlaying the actual observations on top of the boxplots. Let’s also keep the color scheme going from the first plot.
better_box <- box_plot +
geom_jitter(width = 0.2, aes(color = Species), alpha = 0.7) + # alpha makes points semi-transparent
scale_x_discrete(labels = c("I. setosa", "I. versicolor", "I. virginica")) +
scale_color_brewer(
type = "qual",
palette = "Dark2"
) + # Colors match the scatterplot because Species is a factor
theme(axis.text.x = element_text(face = "italic")) +
guides(color = "none") # Hiding the legend because it doesn't communicate anything about the data
better_box
Using the ggpubr package, we can also add comparisons across groups. You can plot either the actual p-value, or use the *** notation:
we use the following convention for symbols indicating statistical significance: ns: p > 0.05, *: p <= 0.05, **: p <= 0.01, ***: p <= 0.001, ****: p <= 0.0001 (ggpubr docs)
library(ggpubr)
## Loading required package: magrittr
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
##
## Attaching package: 'ggpubr'
## The following object is masked from 'package:cowplot':
##
## get_legend
anova_comparisons <- list(
c("setosa", "versicolor"),
c("versicolor", "virginica"),
c("setosa", "virginica")
)
better_box <- better_box +
# stat_compare_means(label.y = 10) + # adds the global p-value at y = 10 (Kruskal-Wallis by default; set method = "anova" for ANOVA)
stat_compare_means(comparisons = anova_comparisons, label = "p.signif") + # adds the pair-wise comparisons (Wilcoxon tests by default)
scale_y_continuous(expand = expansion(mult = c(0, .1))) # removes the padding below the data and adds 10% above
better_box
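If you want to apply the same star convention to p-values you computed yourself, the cutpoints quoted above translate directly into base R. The helper name p_to_symbol is hypothetical and not part of ggpubr; this is only a sketch of the mapping.

```r
# Cutpoints and symbols follow the convention quoted from the ggpubr docs
p_to_symbol <- function(p) {
  as.character(cut(
    p,
    breaks = c(0, 1e-4, 1e-3, 1e-2, 0.05, 1),
    labels = c("****", "***", "**", "*", "ns"),
    include.lowest = TRUE # so that p = 0 still gets a label
  ))
}

p_to_symbol(c(0.2, 0.03, 0.004, 0.0005, 0.00001))
```

Because cut() uses right-closed intervals by default, a p-value of exactly 0.05 is labeled with a single star, which matches the "p <= 0.05" wording of the convention.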
The ggsave function lets you save the last generated plot (or whichever one you specify) as vector graphics like SVG or PDF (good for importing into Illustrator or Inkscape), or raster graphics like TIF, JPG, or PNG. You can specify the dpi (raster only) and size of the image.
ggsave(scatter_plot, filename = "figs/initial_scatterplot.svg", width = 6, height = 6, units = "in")
ggsave(scatter_plot, filename = "figs/initial_scatterplot.jpg", width = 6, height = 6, units = "in", dpi = 300)
Quick tip: if you play around with the size of the Plots window in RStudio, you can find the size that you like and then click “Save as image” to find out the size of the plot. Use those dimensions in your ggsave call, and you’ll have an easier time recreating the plot if you later decide to change something.
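To make the size and dpi relationship concrete, here is a small sketch that keeps the dimensions in variables and saves one vector and one raster version. The file names are made up, and a temporary directory stands in for the figs/ folder.

```r
library(ggplot2)

demo_plot <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length)) +
  geom_point()

width_in <- 6
height_in <- 6
dpi <- 300

# Pixel dimensions of the raster output: inches x dots per inch
c(width_px = width_in * dpi, height_px = height_in * dpi)

out_dir <- tempdir() # stand-in for the figs/ folder used above
pdf_path <- file.path(out_dir, "demo_plot.pdf") # vector: dpi does not apply
png_path <- file.path(out_dir, "demo_plot.png") # raster: dpi sets the pixel count

ggsave(filename = pdf_path, plot = demo_plot, width = width_in, height = height_in, units = "in")
ggsave(filename = png_path, plot = demo_plot, width = width_in, height = height_in, units = "in", dpi = dpi)
```

Keeping the dimensions in named variables means that every version of the figure is exported at the same physical size.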
The quick way to label your graphs is with ggplot’s labs(title = "A"). But! This aligns “A” with the top of the y-axis. That’s ok, but maybe you want to align the title with the leftmost edge of the y-axis labels. Here’s how to do that using gridExtra (credit: Stack Overflow).
library(grid)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
p <- iris %>%
ggplot(aes(x = Petal.Width, y = Petal.Length, color = Species)) +
geom_point() +
scale_color_brewer(
type = "qual",
palette = "Dark2",
labels = c("I. setosa", "I. versicolor", "I. virginica")
) +
theme(legend.text = element_text(face = "italic"))
title.grob <- textGrob(
label = "Panel A",
x = unit(0, "lines"),
y = unit(0, "lines"),
hjust = 0, vjust = 0,
gp = gpar(fontsize = 16)
)
p1 <- arrangeGrob(p, top = title.grob) # layer the title on top of the subpanel
grid.arrange(p1) # prints the layered plot to the display
Quick tip: if you need to put superscripts, subscripts, or Greek letters in your axis labels, check out this blog post from Tyler Rinker for several examples.
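For a quick taste, base R’s plotmath syntax works inside labs(). The labels below are made up for illustration, and mtcars is only a stand-in dataset.

```r
library(ggplot2)

# Plotmath: ^ gives a superscript, [] a subscript, and Greek letters are written by name
x_lab <- expression(Dose ~ "(" * mu * "g" ~ mL^-1 * ")")
y_lab <- expression(beta[1] ~ "(a.u.)")

label_demo <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = x_lab, y = y_lab)
```

In these expressions, ~ inserts a space between pieces and * joins them without one.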
Once you have your sub-panels generated and labeled, you can arrange them into a figure using several packages. There are lots of options here:
- arrangeGrob() from gridExtra
- plot_grid() from cowplot (will do the labeling for you!)
- patchwork package from Thomas Lin Pedersen
- ggarrange() from egg
I recommend choosing the easiest option that still lets you achieve your plotting dreams. patchwork is very easy to use, but not as fully featured as egg or gridExtra. gridExtra can be a little tricky to use, and is not quite as flexible as egg. cowplot is fairly simple and has defaults that are designed for academic figure making.
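To give a feel for why patchwork counts as the easy option, here is a minimal sketch that assumes the patchwork package is installed. The two panels are rebuilt from iris so the example stands alone, and the panel names are my own.

```r
library(ggplot2)
library(patchwork)

left_panel <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()
right_panel <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length, color = Species)) +
  geom_point()

two_panel <- left_panel + right_panel +
  plot_layout(widths = c(1, 2)) + # right panel twice as wide as the left
  plot_annotation(tag_levels = "A") # tag the panels A, B, ...
```

With patchwork, plots are combined with +, and the layout and panel tags are added as further terms in the same sum.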
Here’s the default behavior of gridExtra - which might send you running to StackOverflow…
grid.arrange(better_scatter, better_box, ncol = 2) # again, this prints to display
better_scatter_box <- arrangeGrob(p1, better_box, ncol = 2) # make a grob that includes both plots
ggsave(plot = better_scatter_box, filename = "figs/double_plot.png", device = "png")
## Saving 7 x 3.5 in image
Cringing a little? Me too. Let’s fix this quickly, using cowplot’s plot_grid.
plot_grid(better_box,
better_scatter,
labels = c("A", "B"), # note: there are LOTS
align = "h", # of options for customization
rel_widths = c(1, 2) # beyond these here
)
Get your rulers out, the plots are actually aligned!
Let’s say we need to include image files in our plot as panels C-E, maybe representative images of those three iris species (image credits: Plant World Seeds). We can also do that with cowplot (requires the magick package)!
Note: the positioning of the flower species labels varies by panel because it’s based on the center of the text string. The default, 0.5, indicates the dead center of the plot. Go more negative to move to the right, and more positive to move to the left. This is fiddly, and final positioning will depend on the font size and the final figure size. If you’re working in an RStudio Notebook, click the “Show in New Window” button after running the code chunk to see how the plot actually looks at full size.
library(magick)
## Linking to ImageMagick 6.9.9.39
## Enabled features: cairo, fontconfig, freetype, lcms, pango, rsvg, webp
## Disabled features: fftw, ghostscript, x11
setosa <- ggdraw() +
draw_image("figs/iris_setosa.jpg", scale = 0.8) +
draw_text("Iris setosa", fontface = "bold.italic", color = "white", size = 12, hjust = -0.2, vjust = 8)
virginica <- ggdraw() +
draw_image("figs/iris_virginica.JPG", scale = 0.8) +
draw_text("Iris virginica", fontface = "bold.italic", color = "white", size = 12, hjust = 0, vjust = 8)
versicolor <- ggdraw() +
draw_image("figs/iris_versicolor.JPG", scale = 0.8) +
draw_text("Iris versicolor", fontface = "bold.italic", color = "white", size = 12, hjust = 0.1, vjust = 8)
first_row <- plot_grid(better_box,
better_scatter,
labels = c("A", "B"),
ncol = 2,
align = "h",
rel_widths = c(1, 2),
vjust = 1, # moves the panel label up or down
hjust = -0.25 # moves the panel label R/L
)
second_row <- plot_grid(setosa,
versicolor,
virginica,
labels = c("C", "D", "E"),
ncol = 3,
vjust = 3
)
complete_figure <- plot_grid(first_row,
second_row,
labels = NULL,
ncol = 1
)
complete_figure
ggsave(filename = "figs/iris_multipanel.png", plot = complete_figure)
## Saving 7.5 x 6 in image
Image credit: the egg package vignette
If you’ve ever “ungrouped” the SVG or PDF version of a ggplot - you’ve encountered the grid. The image above, from Baptiste Auguié’s powerful egg package, shows the layout of a ggplot. When you’re having trouble getting an annotation (using geom_text()) to fit onto your final image, often the problem is that you’ve run out of whitespace around your plot for the annotation. You can increase the whitespace by expanding the margins. There are ways to do this in both base R graphics and ggplot. Here’s a ggplot example that cleans up the look of our last multi-panel plot.
better_box <- better_box +
theme(plot.margin = unit(c(2, 0, 0, 0), "lines")) # top, right, bottom, left
better_scatter <- better_scatter +
theme(plot.margin = unit(c(2, 0, 0, 0), "lines"))
setosa <- setosa +
theme(plot.margin = unit(c(0.5, 0, 0, 0), "lines"))
virginica <- virginica +
theme(plot.margin = unit(c(0.5, 0, 0, 0), "lines"))
versicolor <- versicolor +
theme(plot.margin = unit(c(0.5, 0, 0, 0), "lines"))
first_row <- plot_grid(better_box,
better_scatter,
labels = c("A", "B"),
ncol = 2,
align = "h",
rel_widths = c(1, 2)
) # Note that we don't need the vjust and hjust from cowplot anymore
second_row <- plot_grid(setosa,
versicolor,
virginica,
labels = c("C", "D", "E"),
ncol = 3
)
complete_figure2 <- plot_grid(first_row,
second_row,
labels = NULL,
ncol = 1
)
complete_figure2
As for text sizes - using a package that allows you to combine plots within R is the easiest way to stay consistent. Outputting your final figure in the journal’s preferred size can reduce the chance that your font sizes will end up too small.
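One way to keep text legible is to set the base font size and the export size together. In this sketch the 174 mm width, 90 mm height, and 8 pt base size are assumptions for illustration only, so check your journal’s author guidelines for the real numbers.

```r
library(ggplot2)

# Assumed values for illustration; replace with your journal's requirements
journal_width_mm <- 174
figure_height_mm <- 90
base_pt <- 8

sized_plot <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length, color = Species)) +
  geom_point() +
  theme_classic(base_size = base_pt) # all theme text scales from this base size

sized_path <- file.path(tempdir(), "sized_plot.pdf") # stand-in for the figs/ folder
ggsave(sized_path, plot = sized_plot, width = journal_width_mm, height = figure_height_mm, units = "mm")
```

Because the file is exported at its final physical size, the figure does not need to be rescaled later, and the text stays at the size you set.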
The above covered the mechanics of academic plot making, so now let’s spend a little time on the art. This is a huge topic, with tons of great books (see the end of this page for a few). Assuming your journal of choice doesn’t charge extra for color, or force you to use different types of cross-hatching instead (looking at you Molecular Psychiatry), you have options!
Consider differences in perception
Beyond being attentive to color blindness (up to 8% of men with Northern European ancestry), don’t let your colors get too light or use too many categories as it becomes very hard to distinguish differences. The ColorBrewer site makes this easy, and the palettes can be used in ggplot with scale_color_brewer() (or scale_fill_brewer() if fill is the aesthetic that your data is mapped to).
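The RColorBrewer package ships a table, brewer.pal.info, that flags which palettes are colorblind-friendly. A short sketch for listing the qualitative ones:

```r
library(RColorBrewer)

# brewer.pal.info is a data frame with one row per palette;
# its `colorblind` column flags the colorblind-friendly ones.
qualitative <- brewer.pal.info[brewer.pal.info$category == "qual", ]
cb_friendly <- rownames(qualitative)[qualitative$colorblind]
cb_friendly
```

Any of the names returned can be passed as the palette argument of scale_color_brewer() or scale_fill_brewer().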
Divergent vs sequential
Should your color scheme be divergent (e.g. purple > white > green) or sequential (dark green > light green > white)? Think about what your middle value represents - is it 0, or a critical middle ground? If so, use a diverging palette. If not, use a sequential palette. Please don’t confuse your reader by plotting data that ranges from 1 to 20 using colors that go from dark blue to red, passing by yellow in the middle, unless you make it clear that your mid-point (10) defines something specific/important.
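Here is a rough sketch of the two cases using simulated data. The data frame and its column names are invented for the example: log2_fc has a meaningful midpoint at 0, while n_reads simply runs from low to high.

```r
library(ggplot2)

set.seed(1)
expr <- data.frame(
  gene = factor(paste0("gene", 1:20)),
  log2_fc = rnorm(20), # simulated: centered on 0 (down vs up)
  n_reads = round(runif(20, min = 1, max = 20)) # simulated: 1 to 20, no special midpoint
)

# Meaningful midpoint at 0, so use a diverging scale anchored there
diverging_plot <- ggplot(expr, aes(x = gene, y = 1, fill = log2_fc)) +
  geom_tile() +
  scale_fill_gradient2(low = "purple4", mid = "white", high = "darkgreen", midpoint = 0)

# No meaningful midpoint, so use a sequential scale (light = low, dark = high)
sequential_plot <- ggplot(expr, aes(x = gene, y = 1, fill = n_reads)) +
  geom_tile() +
  scale_fill_distiller(palette = "Greens", direction = 1)
```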
Readers assume continuity
If you have 5 figures in your paper, and use a diverging color scale from purple to green for one type of experiment (e.g. upregulation and downregulation of genes), try not to use that same set of colors for a different type of experiment later on (e.g. to refer to different genotypes of mice).
Keep categorical coloring consistent across your paper, and try not to use sequential colors when the categories are not sequential. It’s fine to use shades of red for 5 different quantiles of a variable, but confusing to use them for 5 different brain regions or species of frog.
Reduce the chance that your reader will go “Oh wait, that’s not…” when they move from one figure to the next.
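One way to enforce that consistency is to define the category colors once, as a named vector, and reuse the same scale in every figure. This is a sketch: the helper scale_species() is hypothetical, and the hex codes are the first three Dark2 colors.

```r
library(ggplot2)

# Colors are tied to factor levels by name, not by position
species_colors <- c(setosa = "#1B9E77", versicolor = "#D95F02", virginica = "#7570B3")

# Hypothetical helper so every figure uses the same mapping
scale_species <- function() scale_color_manual(values = species_colors)

fig1 <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length, color = Species)) +
  geom_point() +
  scale_species()

# A later figure that only shows two of the three species
two_species <- droplevels(subset(iris, Species != "setosa"))
fig2 <- ggplot(two_species, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +
  geom_point() +
  scale_species()
```

Because the values are named, versicolor and virginica keep their colors in fig2 even though setosa is missing, which would not happen if the palette were assigned by position.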
If it won’t cost you $1000s in grant money for color figure charges, have fun with color! Studying frogs? Try shades of green! Working on evolution? Maybe earth tones reflecting an archeological dig. Do you have one group that you want your reader to evaluate compared to the rest? Highlight that one using a consistent color throughout the manuscript.
library(RColorBrewer)
brewer.pal(3, "Dark2") # to find out the hex colors for the palette we were using
## [1] "#1B9E77" "#D95F02" "#7570B3"
# Attach specific colors to levels of our factor "Species"
highlight_virginica <- c("grey", "grey", "#7570B3")
names(highlight_virginica) <- levels(iris$Species)
highlight_virginica
## setosa versicolor virginica
## "grey" "grey" "#7570B3"
# Focusing on comparisons to virginica
anova_comparisons <- list(
c("versicolor", "virginica"),
c("setosa", "virginica")
)
highlight_box <- iris %>%
ggplot(aes(x = Species, y = Sepal.Length)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(width = 0.2, aes(color = Species)) +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5)) +
scale_x_discrete(labels = c("I. setosa", "I. versicolor", "I. virginica")) +
scale_color_manual(values = highlight_virginica) + # setting color to our values
scale_y_continuous(expand = expansion(mult = c(0, .1))) + # adds a little padding to the top
theme(axis.text.x = element_text(face = "italic")) +
guides(color = "none") +
labs(y = "Sepal Length (cm)") +
stat_compare_means(comparisons = anova_comparisons, label = "p.signif") +
theme(plot.margin = unit(c(2, 0, 0, 0), "lines"))
highlight_scatter <- iris %>%
ggplot(aes(x = Petal.Width, y = Petal.Length, color = Species, shape = Species)) +
geom_point() +
labs(y = "Petal length (cm)", x = "Petal width (cm)") +
scale_color_manual(
values = highlight_virginica,
labels = c("I. setosa", "I. versicolor", "I. virginica")
) +
scale_shape_manual(
values = c(1, 15, 19),
labels = c("I. setosa", "I. versicolor", "I. virginica")
) +
theme(legend.text = element_text(face = "italic")) +
theme(plot.margin = unit(c(2, 0, 0, 0), "lines"))
plot_grid(highlight_box, highlight_scatter, ncol = 2, labels = c("A", "B"), rel_widths = c(1, 2), align = "h")
If you want to specify shades by name (e.g. “saddlebrown”), this “Colors in R” document is helpful (if somewhat limiting should you be searching for a very specific shade). Have a great image in your last department talk that captures your topic? Extract colors from it for your plots using Adobe Color.
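You can also look up named colors from within R itself. A quick sketch using only base R functions:

```r
# All color names that R understands
head(colors())

# Search the names for a shade family
grep("brown", colors(), value = TRUE)

# Convert a named color to its hex code, e.g. for use in other software
hex_code <- rgb(t(col2rgb("saddlebrown")), maxColorValue = 255)
hex_code
```

The hex code is handy when you need to match the same shade in Illustrator or Inkscape.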
Pushing the envelope on plot design can be tricky - in academic settings, unconventional or complicated plots can confuse your reader (or worse, Reviewer #3 who was just slipped decaf at Starbucks). If you want to branch out, check whether your redesign is intuitive to someone unfamiliar with your research (not your labmates!).
glimpse(starwars)
## Observations: 87
## Variables: 13
## $ name <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "L…
## $ height <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, …
## $ mass <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.…
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "bro…
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "lig…
## $ eye_color <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "…
## $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, …
## $ gender <chr> "male", NA, NA, "male", "female", "male", "female", N…
## $ homeworld <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaa…
## $ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human",…
## $ films <list> [<"Revenge of the Sith", "Return of the Jedi", "The …
## $ vehicles <list> [<"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <…
## $ starships <list> [<"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanc…
basic_bar <- starwars %>%
ggplot(aes(x = homeworld)) +
geom_bar() + # plots the number of rows per homeworld by default
theme(axis.text.x = element_text(face = "italic", angle = 45, vjust = 0.5)) +
theme(plot.margin = unit(c(2, 0, 0, 0), "lines"))
# A little data tidying goes a long way
a_galaxy <- starwars %>%
mutate(homeworld = replace_na(homeworld, "Unknown")) %>%
count(homeworld, name = "n_characters") %>%
filter(n_characters > 1) %>%
mutate(homeworld = reorder(homeworld, n_characters))
better_bar <- a_galaxy %>%
ggplot(aes(x = homeworld, y = n_characters)) +
geom_col() +
scale_y_continuous(expand = expansion(mult = c(0, .1))) + # removes space below bars, adds 10% above
coord_flip() +
labs(y = "# characters", x = "Home world") +
theme(plot.margin = unit(c(2, 0, 0, 0), "lines"))
not_a_bar <- a_galaxy %>%
ggplot(aes(x = homeworld, y = n_characters)) +
geom_segment(aes(x = homeworld, xend = homeworld, y = 0, yend = n_characters), color = "grey") +
geom_point(size = 3, color = "red", fill = "orange", alpha = 0.7, shape = 21, stroke = 1) +
scale_y_continuous(expand = expansion(mult = c(0, .1))) +
coord_flip() +
labs(y = "# characters", x = "Home world") +
theme(plot.margin = unit(c(2, 0, 0, 0), "lines"))
plot_grid(basic_bar,
better_bar,
not_a_bar,
nrow = 1,
labels = c("Desk Reject", "Resubmit", "Accept ;)"),
hjust = -0.2,
align = "h"
)
devtools::install_github("thomasp85/patchwork")
devtools::install_github('bbc/bbplot')
devtools::install_github('EmilHvitfeldt/paletteer') # A crazy number of color palettes for plots!